Preprocessing Methods for Word Alignment

Author

  • Tsuyoshi Okita
Abstract

This paper compares four preprocessing approaches for word alignment: 1) sentence removal, 2) good points, 3) sentence duplication, and 4) removal of doubtful alignments. Two are statistically motivated and the other two are heuristics. We focus on the IBM Model 4 word aligner, which often runs into trouble when handling paraphrases, multi-word expressions, and non-literal translations. We assume that IBM Model 4 is about 90% correct and only around 5% wrong.
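As a rough illustration of what corpus-level preprocessing of this kind can look like, the sketch below filters or repeats sentence pairs in a parallel corpus before it is handed to a word aligner. The abstract does not say how the paper selects sentences, so the length-ratio score, the thresholds, and all function names here are hypothetical stand-ins, not the paper's method.

```python
from typing import Callable, List, Tuple

# A sentence pair is (source tokens, target tokens).
SentencePair = Tuple[List[str], List[str]]


def length_ratio_score(pair: SentencePair) -> float:
    """Hypothetical quality score in [0, 1]; 1.0 when both sides have equal length."""
    src, tgt = pair
    if not src or not tgt:
        return 0.0
    return min(len(src), len(tgt)) / max(len(src), len(tgt))


def remove_sentences(
    corpus: List[SentencePair],
    score: Callable[[SentencePair], float],
    threshold: float,
) -> List[SentencePair]:
    """Sentence removal (assumed form): drop pairs scoring below the threshold."""
    return [pair for pair in corpus if score(pair) >= threshold]


def duplicate_sentences(
    corpus: List[SentencePair],
    score: Callable[[SentencePair], float],
    threshold: float,
    copies: int = 2,
) -> List[SentencePair]:
    """Sentence duplication (assumed form): repeat high-scoring pairs so they
    carry more weight when the aligner is trained on the corpus."""
    out: List[SentencePair] = []
    for pair in corpus:
        repeat = copies if score(pair) >= threshold else 1
        out.extend([pair] * repeat)
    return out


corpus: List[SentencePair] = [
    ("the house is small".split(), "das Haus ist klein".split()),
    ("yes".split(), "ja , das ist wirklich ganz und gar richtig".split()),
]
kept = remove_sentences(corpus, length_ratio_score, threshold=0.5)
weighted = duplicate_sentences(corpus, length_ratio_score, threshold=0.9)
```

Both functions leave the aligner itself untouched and only change the training corpus it sees, which is the sense of "preprocessing" assumed here.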

Similar resources

Semi-supervised Word Alignment with Mechanical Turk

Word alignment is an important preprocessing step for machine translation. The project aims at incorporating manual alignments from Amazon Mechanical Turk (MTurk) to help improve word alignment quality. As a global crowdsourcing service, MTurk can provide a flexible and abundant labor force and therefore reduce the cost of obtaining labels. An easy-to-use interface is developed to simplify the lab...

Combination of Statistical Word Alignments Based on Multiple Preprocessing Schemes

We present an approach to using multiple preprocessing schemes to improve statistical word alignments. We show a relative reduction of alignment error rate of about 38%.

Iterative reordering and word alignment for statistical MT

Word alignment is necessary for statistical machine translation (SMT), and reordering as a preprocessing step has been shown to improve SMT for many language pairs. In this initial study we investigate if both word alignment and reordering can be improved by iterating these two steps, since they both depend on each other. Overall no consistent improvements were seen on the translation task, but...

The Karlsruhe Institute for Technology Translation System for the ACL-WMT 2010

This paper describes our phrase-based Statistical Machine Translation (SMT) system for the WMT10 Translation Task. We submitted translations for the German to English and English to German translation tasks. Compared to state-of-the-art phrase-based systems, we performed additional preprocessing and used a discriminative word alignment approach. The word reordering was modeled using POS informat...

Consensus versus Expertise: A Case Study of Word Alignment with Mechanical Turk

Word alignment is an important preprocessing step for machine translation. The project aims at incorporating manual alignments from Amazon Mechanical Turk (MTurk) to help improve word alignment quality. As a global crowdsourcing service, MTurk can provide a flexible and abundant labor force and therefore reduce the cost of obtaining labels. An easy-to-use interface is developed to simplify the lab...

Publication date: 2012